Upgrade CSP to Perspective 3.x #370
Closed
+142
−65
Hi, I'm Davis! I work at @ProspectiveCo, where we maintain Perspective. We have been working alongside some people at Cubist (@ptomecek), and we are interested in upgrading CSP to use the new Perspective 3.0 API. Overall, the transition was quite simple and mostly required API changes around Table creation. Most of the issues were in the `pandas_perspective` adapter, which seems deprecated in favor of the `perspective` adapter. I would like to remove it completely, but left it in for your review.

Specific modifications done to CSP for this change:
- `csp/dataframe.py`: The `to_perspective` function must now take a `Client` object to construct the `Table`.
- `csp/adapters/perspective.py`: The use of `PerspectiveManager` was replaced with the new `Client`/`Server` classes.
- `csp/impl/pandas_perspective.py`: Perspective Table JSON updates no longer support direct `date` or `datetime` values, as these are not JSON-serializable types. To work around this, the objects are translated directly to timestamp integers and sent to Perspective to be parsed into Perspective's time types. `to_df` was rewritten to route through PyArrow, but due to differences in how PyArrow chooses dtypes, there is some hackery around the ordering of categories. We also removed `to_dict` and `to_numpy` because they were removed in the Perspective 3.0 migration.
- `csp/tests/impl/test_pandas_perspective.py`: Many tests were fixed to demonstrate the changes needed to continue day-to-day use of `CspPerspectiveTable`. Some tests were not fixed due to an outstanding Perspective bug (Unit test describing index behavior bug finos/perspective#2756).

Using PyArrow to underlie Perspective's DataFrame support leads to some semantic changes. PyArrow is much more eager to set columns to `CategoricalDtype` instead of `StringDtype`. It also has different behavior regarding category ordering, and perhaps other differences not yet uncovered after fixing the tests.

Let me know where you want to go with this!
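
For reviewers who want a concrete picture of the `date`/`datetime` workaround in `csp/impl/pandas_perspective.py`, here is a minimal standalone sketch of the idea, not the code in this PR. The helper names `to_epoch_millis` and `json_safe_row` are hypothetical, and two things are assumptions on my part: that the integers are milliseconds since the Unix epoch, and that naive datetimes should be treated as UTC.

```python
import json
from datetime import date, datetime, timedelta, timezone

# Reference point for the integer timestamps.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Return integer epoch milliseconds for date/datetime values.

    Hypothetical helper: any other value is returned unchanged.
    """
    if isinstance(value, datetime):
        # Assumption: naive datetimes are interpreted as UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
    elif isinstance(value, date):
        # A plain date maps to midnight UTC on that day.
        value = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    else:
        return value
    # Integer division of timedeltas avoids float rounding.
    return (value - EPOCH) // timedelta(milliseconds=1)


def json_safe_row(row):
    """Replace date/datetime values in a row dict so json.dumps accepts it."""
    return {key: to_epoch_millis(val) for key, val in row.items()}


row = {
    "symbol": "AAPL",
    "trade_date": date(2024, 1, 2),
    "timestamp": datetime(2024, 1, 2, 9, 30),
}

# json.dumps(row) would raise TypeError; the converted row serializes cleanly.
payload = json.dumps([json_safe_row(row)])
```

The `datetime` check comes before the `date` check because `datetime` is a subclass of `date`. Whether the receiving column is parsed as a date or a datetime would depend on the table schema, which this sketch does not cover.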